Reconstructing galaxy fundamental distributions and
scaling relations from photometric redshift surveys.
Applications to the SDSS early-type sample

ABSTRACT

Noisy distance estimates associated with photometric rather than spectroscopic
redshifts lead to a biased estimate of the luminosity distribution, and produce a
correlated mis-estimate of the sizes. We consider a sample of early-type galaxies from
the SDSS DR6 for which both spectroscopic and photometric information is available,
and apply the generalization of the $V_{\max}$ method to correct for these biases. We show
that our technique recovers the true redshift, magnitude and size distributions, as well
as the true size-luminosity relation. We find that using only 10% of the spectroscopic
information randomly spaced in our catalog is sufficient for the reconstructions to be
accurate within ~3%, when the photometric redshift error is $\delta z \simeq 0.038$. We then
address the problem of extending our method to deep redshift catalogs, where only
photometric information is available. In addition to the specific applications outlined
here, our technique impacts a broader range of studies, when at least one
distance-dependent quantity is involved. It is particularly relevant for the next generation of
surveys, some of which will only have photometric information.

Key words: distance scale - galaxies: distances and redshifts - methods: statistical
- galaxies: formation - catalogues - surveys - galaxies: fundamental parameters -
cosmology: observations.

1 INTRODUCTION

The redshift and luminosity distributions of galaxies, and galaxy scaling relations,
such as the color-magnitude relation, the size-surface brightness relation, the
luminosity-size relation or the Fundamental Plane, play a crucial role in constraining
galaxy formation models. However, a bias will be intrinsically present in all these
correlations if the transformation from observable to physical quantity involves one or
more distance-dependent observables, due to noise in the distance estimate. Distances
are only known approximately if photometric redshifts are available, but spectroscopic
redshifts are not. This is already the case of many current surveys (e.g. SDSS,
COMBO-17, MUSYC, COSMOS, CFHTLS), where the number of objects with photometric
redshifts is more than an order of magnitude bigger than that of spectroscopic
redshifts, and will be increasingly true of the next generations of deep multicolor
wide-area photometric surveys (e.g. DES, LSST, SNAP, JDEM), which will increase the
number of galaxies with multi-band photometry to a few billions.

Photometric information is essential and statistically more significant for studying
cosmological evolution at a fraction of the cost of a full spectroscopic survey.
Therefore, many efforts are currently devoted to improve photometric redshift
estimations (see for example Feldmann et al. 2006; Carliles et al. 2008; Hildebrandt et
al. 2008; Oyaizu et al. 2008a,b; Stabenau et al. 2008; Budavari 2009; Ilbert et al.
2009; Jouvel et al. 2009; Salvato et al. 2009), especially because accurate photometric
redshifts are among the key requirements for precision weak lensing measurements
(Banerji et al. 2008; Ma & Bernstein 2008; Mandelbaum et al. 2008). Well-understood
photometric redshifts and errors are also vital in resolving redshift ambiguities where
spectroscopy shows only a single spectral line (Lilly et al. 2007), and are especially
crucial to dark energy science (Bernstein & Huterer 2009; Sun et al. 2009).

Hence, methods for recovering unbiased estimates of the redshift distribution
(Padmanabhan et al. 2005; Sheth 2007; Lima et al. 2008), the luminosity function (Sheth
2007), and galaxy scaling relations (Rossi & Sheth 2008) from magnitude limited
photometric redshift datasets are indeed necessary. In particular, in Rossi & Sheth
(2008) we described two techniques which can handle this complication (i.e. a
non-parametric deconvolution method and a maximum likelihood approach), and the
extension of the $V_{\max}$ algorithm (Lucy 1974) was tested on a mock catalog. Here we
apply the same method to the SDSS DR6, and investigate the bias present in the
fundamental distributions and in the luminosity-size relation for early-type galaxies,
which arises when computing these relations from photometric data. The technique is
insensitive to the actual quality of photo-z estimates, but it relies on the knowledge
of the conditional probability $p(z_{\rm photo}|z_{\rm spectro})$. In essence, if photo-z
errors are at least known, our technique is applicable.

We have two main goals in this study. The first is to 
use a selected sample of early types from the SDSS DR6, for 
which both photo-zs and spectro-zs are known, and apply 
our deconvolution technique to derive the unbiased redshift, 
magnitude and size distributions, and the magnitude-size re- 
lation. We refer to this procedure as the "calibration" part. 
The second and more challenging goal is to use our calibra- 
tion in order to infer information when spectroscopic data 
is poor or not available (i.e. deep redshift catalogs). 

The outline of the paper is as follows. Section 2 de- 
scribes the SDSS galaxy catalog used in this analysis, and 
highlights the criteria adopted for the early-type selection. 
Section 3 presents the reconstruction of the redshift, magnitude, size, and
size-magnitude distributions from photometric data, for the early-type sample. It
provides a brief summary of the deconvolution method, discusses the dependence of
$p(z_{\rm photo}|z_{\rm spectro})$ on magnitude, and covers other technical details; the
relation between our deconvolution procedure and a convolution-based approach is
pointed out in an appendix. Section 4 deals with extending our technique when
spectroscopic information is poor, or when only photometry is available. Some tests are
performed on the early-type "calibration" catalog, and in particular it is found that,
by using only 10% of the spectroscopic information randomly spaced in redshift space,
one can accurately reconstruct the galaxy fundamental distributions. Finally, Section 5
summarizes our findings, and indicates ongoing and future studies and applications.

Whenever necessary, we assume a spatially flat cosmological model with
$(\Omega_{\rm M}, \Omega_{\Lambda}, h) = (0.3, 0.7, 0.7)$, where $\Omega_{\rm M}$ and
$\Omega_{\Lambda}$ are the present day densities of matter and cosmological constant
scaled to the critical density, and write the Hubble constant as
$H_0 = 100\,h$ km s$^{-1}$ Mpc$^{-1}$.



2 THE SDSS EARLY-TYPE SAMPLE

The catalog we use is based on the Sloan Digital Sky Survey (SDSS) Data Release 6
(DR6, http://www.sdss.org/), available online through the Catalog Archive Server Jobs
System (CasJobs). We adopt selection criteria suitable to early-type galaxies, as
described in Bernardi et al. (2003). Specifically, from the DR6 galaxy photometric
sample (PhotoObjAll in the Galaxy view, which contains primary objects that are
classified as galaxies), we select objects according to these general criteria:

• Petrosian magnitudes in the range 14.50 < m < 17.45 
for the r band. 

• Concentration index $R_{\rm petro,90}/R_{\rm petro,50} > 2.5$ in the i
band.

• Likelihood of the de Vaucouleurs model > 0.8.

• Objects with both photometric and spectroscopic red- 
shifts available. 
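
As an illustration only, the cuts listed above can be written as a simple filter over
catalog records. The field names below (`petro_mag_r`, `petro_r90_i`, `petro_r50_i`,
`dev_likelihood`, `z_photo`, `z_spectro`) are hypothetical placeholders rather than
actual SDSS column names, and the likelihood is assumed to be normalized to the range
0-1.

```python
def is_early_type(obj):
    """Return True if a catalog record passes the photometric cuts listed above.

    `obj` is a dict whose keys are hypothetical names, not SDSS column names.
    """
    concentration = obj["petro_r90_i"] / obj["petro_r50_i"]
    return (
        14.50 < obj["petro_mag_r"] < 17.45   # r-band Petrosian magnitude range
        and concentration > 2.5              # i-band concentration index
        and obj["dev_likelihood"] > 0.8      # de Vaucouleurs model likelihood
        and obj["z_photo"] is not None       # photometric redshift available
        and obj["z_spectro"] is not None     # spectroscopic redshift available
    )
```

In practice such cuts would more naturally be expressed in the database query itself;
the function form is used here only to make the logic explicit.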

No redshift or velocity dispersion cuts were made, although we tested the effect of a
velocity dispersion cut ($\sigma > 0$, i.e. good S/N) and found no substantial
difference. Our catalog contains 163,718 objects, and consists of model magnitudes,
Petrosian radii, de Vaucouleurs and exponential fit scale radii along with their
corresponding axis ratios in the r band, photometric and spectroscopic redshifts and
their quoted errors.

Model magnitudes are obtained by measuring galaxy 
fluxes through equivalent apertures in all bands, and by fit- 
ting the exponential or de Vaucouleurs model of higher like- 
lihood in the r filter and applying it in the other bands, after 
convolution with a PSF in each band (for more details, see 
Blanton et al. 2003). The previous fitting procedures yield 
also the effective radii of the models and the axis ratio of 
the best fit model. In particular, the Petrosian ratio $R_{\rm P}$ at
a radius r from the center of an object is defined to be the
ratio of the local surface brightness in an annulus at r to the
mean surface brightness within r, as described by Blanton
et al. (2001) and by Yasuda et al. (2001). The Petrosian radius
$r_{\rm P}$ is the radius at which $R_{\rm P}(r_{\rm P})$ equals some specified
value $R_{\rm P,lim}$, set to 0.2 in our case.

We select photometric redshifts from the SDSS Photoz 
table. This set of photometric redshifts has been obtained 
with the template fitting method, which simply compares 
the expected colors of a galaxy (derived from template spec- 
tral energy distributions) with those observed for an indi- 
vidual galaxy. The empirical templates of Coleman, Wu & 
Weedman (1980), extended with spectral synthesis models, 
are used. These templates were adjusted to fit the calibra- 
tions, as explained in Budavari et al. (2000). More detailed 
information about the photo-z catalog used here is also pro- 
vided in Csabai et al. (2003), and references therein. The 
main advantage of this technique in computing photo-zs is a 
broader redshift range coverage for all types of galaxies, and 
the additional information like spectral type, K-correction 
and absolute magnitudes. However, its accuracy is severely 
limited by the lack of perfect spectral energy distribution 
(SED) models. In fact, the quality of photometric redshift 
estimation of faint objects (or with large photometric errors) 
is weak. More generally, the standard scenario for template 
fitting is to take a small number of spectral templates and 
choose the best fit by optimizing the likelihood of the fit 
as a function of redshift, type and luminosity. Variations of 
this approach have been developed in the last few decades. 
For example, in the recent SDSS DR7 photometric redshifts 
are obtained with a hybrid method, namely a combination 
of the template fitting procedure and of a technique which 
compares the observed colors of galaxies to a reference set 
that has both colors and spectroscopic redshifts observed. 

Figure 1. Photometric and spectroscopic SDSS redshift distributions for the early-type
sample: effect of the selection criteria. Solid lines represent the redshift
distributions of the sample considered in this study (Sample 1). Dotted lines are the
result of using only spectra of good quality (Sample 2). Dashed lines denote a more
sophisticated selection process, as explained in the main text (Sample 3).

We finally cross-correlate the photometric information with the SDSS DR6 spectroscopic
sample (SpecObjAll), and select only those photometric objects for which spectroscopic
information is also available. The spectroscopic pipeline (spectro1d) assigns a final
redshift to each object spectrum by choosing the emission or cross-correlation redshift
with the highest likelihood and stores this as z in the specObj table. In addition to
spectral classification based on measured lines, galaxies are classified by a Principal
Component Analysis (PCA), using cross correlation with eigentemplates constructed from
the SDSS spectroscopic data. In the selection of our sample we use only photometric
criteria, but more robust constraints can be applied in order to reduce galaxy-type
errors, and their effect is illustrated in Figure 1. In both panels, solid lines
represent the redshift distributions of the calibration sample used in this study
(Sample 1). However, if in addition we require spectra of good quality or without
masked regions (SDSS warning flag for spectra of low quality set to zero), then the
number of galaxies drops to 133,938 (dotted lines in Figure 1, Sample 2). Finally, if we
consider only spectra with PCA classification numbers $a < -0.1$, typical of early-type
galaxy spectra (Connolly & Szalay 1999), we find 110,309 objects (dashed lines in
Figure 1, Sample 3). For all these samples, we provide in Table 1 the median
spectroscopic and photometric redshifts, their corresponding median absolute deviations
(MAD), the standard normalized median absolute deviation (NMAD) defined as in Hoaglin et
al. (1983) by $1.48 \times {\rm median}[|\Delta z|/(1 + z_{\rm spectro})]$, and the
dispersion $\sigma_{\Delta z/(1+z_{\rm spectro})}$, where
$\Delta z = z_{\rm spectro} - z_{\rm photo}$.
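
The scatter statistics defined above are simple to compute from paired redshift lists.
The following is a minimal sketch using only the Python standard library; the function
names are ours, and taking the dispersion as the population standard deviation of
$\Delta z/(1+z_{\rm spectro})$ is an assumption, since the text does not specify the
estimator.

```python
from statistics import median, pstdev


def mad(values):
    """Median absolute deviation about the median."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


def photoz_scatter(z_spectro, z_photo):
    """Return (NMAD, dispersion) of dz / (1 + z_spectro), dz = z_spectro - z_photo."""
    scaled = [(zs - zp) / (1.0 + zs) for zs, zp in zip(z_spectro, z_photo)]
    nmad = 1.48 * median(abs(s) for s in scaled)
    # Assumption: dispersion taken as the population standard deviation.
    sigma = pstdev(scaled)
    return nmad, sigma
```

The NMAD is insensitive to catastrophic outliers, whereas the dispersion is not, which
is one reason the two can differ noticeably for the same sample.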



In our study we consider the sample obtained with the 
photometric-only selection process (Sample 1). This is be- 
cause our second goal is to rely on this "calibration" subset 
to infer information when only photometry is available. 

More sophisticated criteria for obtaining a well- 
controlled sample of early-type galaxies are presented in 
Park and Choi (2005) who used color, color-gradient, and 
concentration index to classify galaxies into early and late 
types with reliability and completeness exceeding 90%. See 
also Hyde & Bernardi (2009), where problems like contamination by later-type galaxies
and systematic effects due to the use of Petrosian quantities are addressed in detail.
However, since we are not attempting to make a precision measurement of scaling
relations, but rather aim to show how to correct for photo-z biases, adopting more
robust selection criteria would not change the nature of the problems investigated
in this study.



Table 1. Median spectroscopic and photometric redshifts, corresponding median absolute
deviations (MAD), standard normalized median absolute deviations (NMAD) and dispersions
for the three early-type samples illustrated in Figure 1.

                                          SAMPLE 1   SAMPLE 2   SAMPLE 3

median $z_{\rm spectro}$                   0.1021     0.1021     0.1040
median $z_{\rm photo}$                     0.1107     0.1093     0.1131
MAD $z_{\rm spectro}$                      0.0307     0.0328     0.0327
MAD $z_{\rm photo}$                        0.0348     0.0361     0.0358
NMAD                                       0.0172     0.0173     0.0154
$\sigma_{\Delta z/(1+z_{\rm spectro})}$    0.0340     0.0263     0.0219



3 DECONVOLUTIONS FROM 
PHOTOMETRIC DATA 

In this section we apply our reconstruction technique, based on the generalization of
the $V_{\max}$ method (Sheth 2007; Rossi & Sheth 2008) and briefly summarized here, to
the redshift, magnitude and size distributions, and to the size-luminosity relation of
the early-type sample. A new deconvolution code named DeFaST (acronym for Deconvolution
Fast, with the convention of using capital letters for consonants), which performs a
fast integral deconvolution, has been developed for this study. Lucy's (1974) iterative
scheme is implemented, in one or two dimensions. Appropriate variations have been
carried out in order to handle correctly different choices of the conditional
probability functions. In particular, for the SDSS early-type sample the conditional
distributions are measured directly from the data and then used in the deconvolution
algorithm. Spline fits to the pdfs are performed in those cases. For technical details
we refer the reader to a trial version of the one-dimensional software, freely available
for download online at the web address
http://www.physics.upenn.edu/~grossi/research/software.htm.

3.1 Extended Vmax method as a non-parametric 
deconvolution-like technique 

The $V_{\max}$ method, originally devised by Schmidt (1968), is a way of testing the
uniformity of spatial distributions. In particular, if $V$ is the comoving volume
between an object in a flux-limited catalog at redshift $z$ and the observer located at
$z = 0$, and $V_{\max}$ is the corresponding total survey volume over which the same
object could have been seen at $z_{\max}$, then clearly the ratio $V/V_{\max}$ is a
measure of the position of the source. Therefore, if a distribution is uniform then the
average value over all the objects in the catalog, $\langle V/V_{\max} \rangle$, must be
0.5. The $V_{\max}$ method allows one to correct for selection effects present in
magnitude-limited datasets, where fainter objects are seen only at closer distances. In
fact, in order to properly estimate for example the luminosity function one must sum
over all the objects in the catalog and weight each source separately by the inverse of
$V_{\max}$ (or the inverse of $V_{\max} - V_{\min}$ if the catalog is limited at both
ends). This is the basis of the procedure developed by Schmidt (1968).
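
The two uses of $V_{\max}$ described above, the uniformity test and the inverse-volume
weighting, can be sketched as follows. This is an illustrative standard-library example
under the assumption that the volumes have already been computed for each object; the
function names and the binning scheme are ours, not those of the DeFaST code.

```python
def mean_v_over_vmax(volumes, vmax):
    """Average V/Vmax; close to 0.5 for a spatially uniform population."""
    ratios = [v / vm for v, vm in zip(volumes, vmax)]
    return sum(ratios) / len(ratios)


def vmax_luminosity_function(abs_mags, vmax, edges, vmin=None):
    """Binned 1/(Vmax - Vmin) estimate of the number density per unit magnitude."""
    if vmin is None:
        vmin = [0.0] * len(vmax)  # catalog limited at the faint end only
    phi = [0.0] * (len(edges) - 1)
    for mag, v_hi, v_lo in zip(abs_mags, vmax, vmin):
        for k in range(len(phi)):
            if edges[k] <= mag < edges[k + 1]:
                phi[k] += 1.0 / (v_hi - v_lo)  # weight each object by inverse volume
                break
    # Divide by the bin width to obtain a density per unit magnitude.
    return [p / (edges[k + 1] - edges[k]) for k, p in enumerate(phi)]
```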

Generalizations of this method to include distance errors have been carried out in
Sheth (2007) for the luminosity function (1D case), and in Rossi & Sheth (2008) for
galaxy scaling relations (2D case, or full n-dimensional manifold).
To summarize, in a flux-limited survey the quantities af- 
fected by the photometric redshift errors are the intrinsic 
luminosity distribution rather than the luminosity function 
itself (which differs from the previous one by the inclusion of 
a 1/Vmax weighting), and the intrinsic joint distribution of 
luminosities and sizes - or in general the joint distribution 
of two (or more) observables affected by the same distance 
errors. However, in a real experiment one measures their 
noisy counterparts. Therefore it is necessary to reconstruct 
the intrinsic distributions first, before applying the Vmax pre- 
scription. This is achieved by recognizing the deconvolution 
nature of this class of problems, hence an iterative algorithm 
is suitable for the reconstructions. 

In fact, adopting Lucy's (1974) formalism, the general n-dimensional problem is that of
estimating the frequency distribution $\Psi(\xi)$ of the intrinsic n-dimensional vector
$\xi$ when the available observed measures, denoted by the vector $x$, are a finite
sample drawn from an infinite population characterized by

$$\Phi(x) = \int \Psi(\xi)\, p(x|\xi)\, d\xi, \qquad (1)$$

where $\Phi(x)$ is the data function accessible to measurements and $p(x|\xi)$ is the
conditional probability of estimating $x$ when the true value is $\xi$. The iterative
procedure to invert the previous expression is

$$\Psi^{r+1}(\xi) = \Psi^{r}(\xi) \int dx\, \frac{\tilde{\Phi}(x)}{\Phi^{r}(x)}\, p(x|\xi), \qquad (2)$$

where

$$\Phi^{r}(x) = \int d\xi\, \Psi^{r}(\xi)\, p(x|\xi). \qquad (3)$$

The index r indicates the rth iteration in the sequence of estimates, and
$\tilde{\Phi}$ is an approximation to $\Phi$ obtained from the observed sample. The
starting value $\Psi^{0}(\xi)$, which initializes the iteration, should be a smooth,
non-negative function having the same integrated density as the observed distribution.
In our deconvolution procedure, we always use the observed histograms (i.e. photo-z
derived distributions) as convenient starting guesses.
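
On binned data the iteration of equations (2) and (3) reduces to a few lines. The
following is a minimal one-dimensional sketch, not the DeFaST implementation: it assumes
the same binning for the observed and intrinsic quantities, a kernel matrix
`kernel[i][j]` approximating $p(x_i|\xi_j)$ with each column summing to one, and the
observed histogram as starting guess.

```python
def lucy_deconvolve(observed, kernel, n_iter=10):
    """Lucy (1974) iteration on binned data.

    observed[i]  : counts in bin i of the estimated quantity x
    kernel[i][j] : p(x_i | xi_j); each column j is assumed to sum to 1
    Returns the estimate of the intrinsic counts after n_iter iterations.
    """
    n_bins = len(observed)
    psi = [float(c) for c in observed]  # starting guess: the observed histogram
    for _ in range(n_iter):
        # Equation (3): forward-convolve the current estimate.
        phi = [sum(kernel[i][j] * psi[j] for j in range(n_bins))
               for i in range(n_bins)]
        # Equation (2): multiplicative update, skipping empty predicted bins.
        psi = [
            psi[j] * sum(observed[i] / phi[i] * kernel[i][j]
                         for i in range(n_bins) if phi[i] > 0.0)
            for j in range(n_bins)
        ]
    return psi
```

Because the update is multiplicative, the estimate stays non-negative, and with a
column-normalized kernel the total number of objects is conserved at every iteration.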

Clearly, the outlined formalism is readily applicable to the size-luminosity correlation
if we interpret $x$ as the 2D vector of the estimated absolute magnitudes and sizes, and
$\xi$ as the vector of the corresponding true (or intrinsic) quantities. Similarly for
the redshift, magnitude and size distributions, where now vectors simply reduce to
scalar quantities (1D case).

3.2 Redshift distribution 

Following Rossi & Sheth (2008), we indicate with $\zeta$ and $z$ the photometric and
spectroscopic redshifts, respectively. As argued before, the problem of estimating the
intrinsic redshift distribution $N(z)$ - number of objects which lie at redshift $z$ -
is best thought of as a deconvolution problem, and if $p(\zeta|z)$ is the probability of
estimating the redshift as $\zeta$ when the true value is $z$, then the distribution of
estimated redshifts is $\mathcal{N}(\zeta) = \int N(z)\, p(\zeta|z)\, dz$. Note that
this is just a particular 1D case of equation (1), where $\xi \to z$, $x \to \zeta$,
$\Phi \to \mathcal{N}$ and $\Psi \to N$. If $p(\zeta|z)$ is known and
$\mathcal{N}(\zeta)$ is measured, then the previous relation is a Fredholm equation of
the first kind, easily solvable with a one-dimensional inversion algorithm. Before
showing the reconstructed intrinsic redshift distribution, we focus our attention on the
conditional probability $p(\zeta|z)$. In reality, this distribution does depend weakly
on apparent magnitude. To illustrate the effect, we split our early-type sample in three
intervals of apparent magnitudes, approximately spaced in bins of 1.5 magnitude width,
i.e. $13.81 \le m < 15.31$ (solid lines in Figure 2), $15.31 < m < 16.81$ (dotted lines
in Figure 2), $16.81 < m < 18.27$ (dashed lines in Figure 2). In the left panel of
Figure 2, contours show levels which are $1/2^n$ times the height of the maximum value
of the density of sources, with n running from 1 to 5, for the three magnitude bins. In
the same figure, the right panel shows an example of $p(\zeta|z, m)$ for each of the
three bins in magnitude and a spectroscopic redshift bin centered on $z = 0.0572$.

Figure 2. Apparent magnitude dependence of the conditional probability $p(\zeta|z)$.
[Left panel] Contours show levels which are $1/2^n$ times the height of the maximum
value of the density of sources, with n running from 1 to 5, for the three magnitude
bins specified in the figure. [Right panel] Example of $p(\zeta - z|z, m)$ for
$z = 0.0572$ and for the three previous bins in magnitude.

Neglecting this small dependence on magnitude does not affect the reconstruction of the
intrinsic redshift distribution significantly. Results are shown in Figure 3, where in
the left panel we compare $\zeta$ and $z$, whereas in the right panel we show the
photometric or observed redshift distribution (dotted line), the spectroscopic or
intrinsic distribution (solid line) and its reconstruction after a few iterations
(dashed line), obtained by applying the one-dimensional deconvolution algorithm based on
Lucy's (1974) inversion technique. The error distributions used in the reconstruction
are inferred directly from the SDSS early-type data. The median redshift of the
spectroscopic sample is 0.1021 (see Table 1), while the median redshift of the
deconvolved photo-z distribution is 0.1002. This value is calculated as follows. We
find the bin which divides the reconstructed spectroscopic distribution in two roughly
equal area parts. Within that bin, we then interpolate with splines and provide the
exact value of $z$ for which the area of the reconstructed distribution is split into
two equal parts.
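
The median calculation just described can be sketched as below. Note the simplification:
the text interpolates with splines inside the bin, whereas this illustrative version
assumes a uniform density within the bin (linear interpolation of the cumulative
counts). The function name is ours.

```python
def histogram_median(edges, counts):
    """Median of a binned distribution.

    Locates the bin containing half of the total area, then interpolates
    linearly inside it (a simplification of the spline interpolation).
    """
    total = float(sum(counts))
    half = 0.5 * total
    cumulative = 0.0
    for k, c in enumerate(counts):
        if c > 0 and cumulative + c >= half:
            fraction = (half - cumulative) / c  # fraction of the bin needed
            return edges[k] + fraction * (edges[k + 1] - edges[k])
        cumulative += c
    raise ValueError("histogram is empty")
```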

Accurately characterizing $p(\zeta|z)$ is necessary for a reliable deconvolution. After
testing different methods, we have achieved good results using cubic splines and found
that simple Gaussian fits provide unsatisfactory mapping of the conditional
distributions (see also Section 4.3).



3.3 Magnitude distribution 

The previously outlined dependence of $p(\zeta|z)$ on apparent magnitude suggests that
one should expect, similarly, a redshift dependence in the corresponding magnitude
conditional distributions. Therefore, it may appear difficult to characterize and
measure the appropriate conditional probabilities, and apply the one-dimensional
deconvolution algorithm to reconstruct the magnitude distribution. However the problem
is simpler, as we show with the following algebra. Let $M$ denote the true absolute
magnitude and $\mathcal{M}$ that estimated using $\zeta$ rather than $z$. Use $D_L(z)$
to denote the luminosity distance, and $\phi(M)$ to indicate the number density of
galaxies with absolute magnitudes $M$. Evolution is neglected. Let $V_{\max}$ denote the
largest comoving volume out of which an object of absolute magnitude $M$ can be seen,
and $V_{\min}$ the analogous volume if the catalog is also limited at the lower end. The
(true) number of galaxies with absolute magnitude $M$ (i.e. the intrinsic luminosity
distribution) is:

$$N(M) = \phi(M)\,[V_{\max}(M) - V_{\min}(M)], \qquad (4)$$

and the total number of objects with estimated absolute magnitudes $\mathcal{M}$ is:

$$\mathcal{N}(\mathcal{M}) = \int dM\, \phi(M)\, \Theta[V_{\max}(M), V_{\min}(M), M, \mathcal{M}]
  = \int dM\, N(M)\, \frac{\Theta[V_{\max}(M), V_{\min}(M), M, \mathcal{M}]}{[V_{\max}(M) - V_{\min}(M)]}, \qquad (5)$$

where

$$\Theta = \int_{D_L(V_{\min})}^{D_L(V_{\max})} dD_L\, \frac{dV_{\rm com}}{dD_L}\, p(\mathcal{M} - M|M, D_L). \qquad (6)$$

Note that since $V_{\max}$ and $V_{\min}$ are known functions of $M$, $\Theta$ itself is
just a complicated function of $M$ and $\mathcal{M}$. Dividing (6) by
$[V_{\max}(M) - V_{\min}(M)]$ yields:

$$\frac{\Theta}{[V_{\max} - V_{\min}]}
  = \int dD_L\, \frac{dV_{\rm com}/dD_L}{[V_{\max} - V_{\min}]}\, p(\mathcal{M} - M|M, D_L)
  = \int dD_L\, p(D_L)\, p(\mathcal{M} - M|M, D_L)
  = p(\mathcal{M} - M|M) = p(\mathcal{M}|M). \qquad (7)$$

Therefore, equation (5) becomes a simple one-dimensional deconvolution, namely:

$$\mathcal{N}(\mathcal{M}) = \int N(M)\, p(\mathcal{M}|M)\, dM. \qquad (8)$$



The above expression (8) is again another particular 1D case of (1), where now
$\xi \to M$, $x \to \mathcal{M}$, $\Psi \to N$ and $\Phi \to \mathcal{N}$. Hence, by
measuring the conditional probability $p(\mathcal{M}|M)$ from the catalog, it is
possible to apply directly the one-dimensional deconvolution algorithm - as in the
previous section. Note that this conclusion is particularly relevant when attempting to
reconstruct the luminosity function from photometric data. Results of this application
are shown in Figure 4, where the left panel compares $\mathcal{M}$ and $M$, while the
right panel shows the one-dimensional reconstruction after 10 iterations (jagged line)
of the intrinsic distribution of absolute magnitudes (solid histogram). The observed
distribution of $\mathcal{M}$ (dotted histogram) was used as a convenient starting guess
in the deconvolution algorithm.

Figure 3. [Left panel] Distribution of spectroscopic and photometric redshifts in our
SDSS early-type catalog. Contours as in Figure 2. [Right panel] Observed, intrinsic and
reconstructed redshift distributions. The dotted histogram was used as a starting guess
for the one-dimensional deconvolution algorithm. Convergence is achieved after a few
iterations.

Figure 4. [Left] Distributions of intrinsic and estimated absolute magnitudes in the
SDSS early-type catalog, which result from the differences between spectroscopic and
photometric redshifts shown in the previous figure. [Right] One-dimensional
reconstruction of the intrinsic absolute magnitude distribution from the distribution of
estimated redshifts. Dotted histogram shows the observed absolute magnitude
distribution, used as a starting guess. Jagged line is the reconstructed intrinsic
distribution, after 10 iterations.

Figure 5. [Upper left] Distribution of seeing-corrected effective angular sizes of
galaxies in our early-type SDSS sample. [Upper right] Corresponding distribution of axis
ratios b/a. [Lower left] Effective angular sizes $r_{\rm dev}$ as a function of the axis
ratio b/a. [Lower right] Distribution of effective circular radii $r_{0,{\rm dev}}$, as
defined in the main text.

We use model magnitudes in the r band, corrected for reddening and extinction, and
assume a standard cosmology in the conversion from apparent to absolute magnitudes - but
we neglect evolution and K-corrections. In particular, while K-corrections are necessary
to properly characterize the absolute magnitude distribution, in this paper we do not
apply them to our de-reddened model magnitudes because the only purpose of our work is
to describe a deconvolution technique, which is independent of those corrections.
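
Measuring a conditional probability such as $p(\mathcal{M}|M)$ from a calibration sample
amounts to building a two-dimensional histogram of (true, estimated) pairs and
normalizing it for each true-value bin. The sketch below illustrates this with raw
histograms; it is a simplification, since the text fits splines to the measured pdfs,
and the function names are ours.

```python
def bin_index(value, edges):
    """Index of the bin containing `value`, or None if it lies outside the edges."""
    for k in range(len(edges) - 1):
        if edges[k] <= value < edges[k + 1]:
            return k
    return None


def conditional_matrix(true_vals, est_vals, edges):
    """Return kernel[i][j], the fraction of objects in true bin j estimated in bin i."""
    n = len(edges) - 1
    kernel = [[0.0] * n for _ in range(n)]
    for t, e in zip(true_vals, est_vals):
        j, i = bin_index(t, edges), bin_index(e, edges)
        if i is not None and j is not None:
            kernel[i][j] += 1.0
    for j in range(n):
        column_total = sum(kernel[i][j] for i in range(n))
        if column_total > 0:
            for i in range(n):
                kernel[i][j] /= column_total  # normalize each true-value bin
    return kernel
```

A matrix built this way has columns summing to one wherever the calibration sample
populates the bin, which is the normalization a binned Lucy iteration relies on.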

3.4 Size distribution 

As for the magnitude distribution, it is possible to reconstruct the intrinsic
distribution of sizes with a one-dimensional deconvolution algorithm (for more details,
see also Section 3.3 in Rossi & Sheth 2008). In what follows, we use $R$ to denote
$\log_{10}$ of the physical size, and $\mathcal{R}$ to denote the estimated size based
on the photometric redshift $\zeta$. We apply one correction to convert the
(seeing-corrected) effective angular radii, $r_{\rm dev}$, output by the SDSS pipeline
to physical radii. Following Bernardi et al. (2003), we define the equivalent circular
effective radius $r_0 = \sqrt{b/a}\; r_{\rm dev}$, where b/a is the corresponding axis
ratio of the de Vaucouleurs radius. Then $R = \log_{10}[r_0 D_L(z)/(1 + z)^2]$, and
similarly $\mathcal{R} = \log_{10}[r_0 D_L(\zeta)/(1 + \zeta)^2]$. We do not apply a
second correction, analogous to the K-correction we would ideally have applied to the
magnitude of each galaxy, to account for the fact that galaxies appear slightly larger
in the bluer bands (e.g. Hyde & Bernardi 2009). Figure 5 shows the distribution of
seeing-corrected effective angular sizes of galaxies in our SDSS early-type sample
(upper left panel), the corresponding distribution of axis ratios b/a (upper right
panel), the effective angular sizes $r_{\rm dev}$ as a function of axis ratio b/a (lower
left panel), and the distribution of equivalent circular effective radii
$r_{0,{\rm dev}}$ (lower right panel).
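
The conversion from angular to physical size can be sketched as follows for the flat
cosmology adopted in this paper. This is an illustrative example only: the choice of
units (angular radius in arcseconds, physical size in kpc/h), the midpoint integration
rule and the function names are assumptions of the sketch, not details given in the
text.

```python
import math


def luminosity_distance(z, omega_m=0.3, omega_l=0.7, n_steps=1000):
    """Luminosity distance in Mpc/h for a flat model with H0 = 100 h km/s/Mpc."""
    c_over_h0 = 2997.92458  # speed of light / (100 km/s/Mpc), in Mpc/h
    dz = z / n_steps
    # Midpoint rule for the comoving distance integral of 1/E(z).
    integral = sum(
        dz / math.sqrt(omega_m * (1.0 + (k + 0.5) * dz) ** 3 + omega_l)
        for k in range(n_steps)
    )
    return (1.0 + z) * c_over_h0 * integral


def log_physical_size(r_dev_arcsec, axis_ratio, z):
    """log10 of r0 * D_L(z) / (1 + z)^2, with r0 = sqrt(b/a) * r_dev, in kpc/h."""
    r0_arcsec = math.sqrt(axis_ratio) * r_dev_arcsec
    r0_rad = math.radians(r0_arcsec / 3600.0)
    size_kpc = r0_rad * luminosity_distance(z) / (1.0 + z) ** 2 * 1000.0
    return math.log10(size_kpc)
```

Evaluating `log_physical_size` once with the spectroscopic redshift and once with the
photometric one gives the pair $(R, \mathcal{R})$ for a given galaxy.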

In analogy with the magnitude case, we think of $\mathcal{N}(\mathcal{R})$, the number
of observed objects with estimated $\mathcal{R}$, as being a convolution of the true
number of objects with size $R$, $N(R)$, with the probability that an object with size
$R$ is thought to have size $\mathcal{R}$. We measure $p(\mathcal{R}|R)$ directly from
the catalog and run the one-dimensional deconvolution algorithm, the result of which is
presented in Figure 6. The left panel compares $\mathcal{R}$ and $R$, and the right
panel shows the one-dimensional reconstruction (jagged line). The intrinsic distribution
of physical sizes (solid line) is recovered after a few iterations, when the observed
distribution of $\mathcal{R}$ (dotted line) is used as a convenient starting guess in
the inversion algorithm. Note that although the difference between the intrinsic and the
observed distribution is small, this departure will suffice to bias the size-luminosity
relation - as we show next.

3.5 Size-magnitude correlation 

Photometric redshift errors broaden both the magnitude and size distributions, as
evident from Figures 4 and 6, but changes to the estimated absolute magnitudes and sizes
are clearly not independent. These correlated changes have an important effect on the
size-luminosity relation, even when the broadening of one of the two distributions is
not severe. This is the case of our SDSS sample, where the size distribution (Figure 6)
is not severely biased, but the size-luminosity relation is still biased. In fact, in
our SDSS catalog $\langle \mathcal{R}|\mathcal{M} \rangle \propto -0.226$, whereas
$\langle R|M \rangle \propto -0.257$, as shown in Figure 7.
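
To make the correlation explicit, the two offsets can be written out under the
assumptions already stated in the text: no K-corrections or evolution, so that
$M = m - 5\log_{10}[D_L(z)/10\,{\rm pc}]$, together with the size definition of
Section 3.4. This is a short worked step added for clarity, not part of the original
derivation:

```latex
\mathcal{M} - M = -5 \log_{10}\!\left[\frac{D_L(\zeta)}{D_L(z)}\right],
\qquad
\mathcal{R} - R = \log_{10}\!\left[\frac{D_L(\zeta)}{D_L(z)}\right]
                - 2 \log_{10}\!\left[\frac{1+\zeta}{1+z}\right].
```

Both offsets are driven by the same redshift error: at the low redshifts of this sample,
an overestimated distance ($\zeta > z$) makes a galaxy appear both more luminous and
larger, so the errors displace points along a fixed direction in the size-magnitude
plane rather than scattering them isotropically.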

Reconstructing an unbiased estimate of size and luminosity from photometric data is also
best thought of as a non-parametric two-dimensional deconvolution (again, see Rossi &
Sheth 2008), and application of the extended $V_{\max}$ 2D algorithm to the SDSS
early-type sample is presented in Figure 7, where it is shown that the use of photo-zs
introduces a bias in the size-luminosity relation (shallower slope in panel on left).
Contours and solid lines indicate respectively the $\mathcal{R} - \mathcal{M}$ relation
associated with photo-z (left), and the expected intrinsic $R - M$ relation (right).
Squares in left panel show the binned starting guess for the two-dimensional
deconvolution algorithm (obtained from photometric information), triangles in right
panel show the result after 7 iterations and circles are the expected binned intrinsic
relation, obtained from spectroscopic information. Convergence to the true solution is
clearly seen.

As pointed out in Hyde & Bernardi (2009), most of the 
scaling relations for early types show evidence for curva- 
ture. In this respect, our technique also accounts for it, as 
we reconstruct intrinsic relations within each bin, without 
performing any fits. 



4 EXTENSIONS TO DEEP REDSHIFT 
CATALOGS 

4.1 Challenges 

If we derive galaxy fundamental distributions and scaling 
relations using only photometry, a bias will be intrinsically 
present - as was shown in the previous section. With our 
deconvolution procedure we can account and correct for 
it. However, our V ma x reconstruction method assumes that 
Figure 6. [Left panel] Distributions of intrinsic and estimated physical sizes in the SDSS early-type catalog. [Right panel] One-dimensional
reconstruction of the intrinsic size distribution from the distribution of estimated redshifts. Dotted histogram shows the observed size 
distribution, used as a starting guess. Jagged line shows the reconstructed intrinsic distribution after 10 iterations. Line styles same as 
Figure 4. 







Figure 7. Effect of photo-zs on the size-luminosity correlation in our SDSS early-type catalog. In the left panel, contours and solid line 
show the $\mathcal{R}-\mathcal{M}$ relation associated with photo-zs, whereas the right panel shows the intrinsic $R-M$ relation. Note the bias (shallower
slope in panel on left) which results from the fact that the photo-z distance error moves points down and left or up and right on this 
plot. Squares in left panel show the binned starting guess for the 2D deconvolution algorithm, triangles in right panel show the result of 
reconstruction after 7 iterations. Circles are the expected binned intrinsic relation, obtained from spectroscopic information. Convergence 
to the true solution is clearly seen. 
Figure 8. Measurements of the conditional probabilities $p(\zeta - z|z)$ for five different spectroscopic redshift bins, as indicated in the
panels. Dotted lines are spline fits to the pdfs when the full spectroscopic information is used; long-dashed lines are spline fits when
only 15% [upper panel], 10% [middle panel], or 5% [lower panel] of the spectral information is used from the original early-type catalog.



the distribution of photo-z errors is known accurately. This
means that spectroscopic redshifts are available for a subset
of the data, as is the case for our SDSS "calibration"
sample. Suppose now that we only have limited spectroscopic
data available. Can we still correct for the bias in a
reliable way, using the information contained in the
spectroscopic "calibration" set?



Two nontrivial issues arise. As we pointed out in Rossi &
Sheth (2008), one concern is whether the number of spectra
that must be taken to specify the error distribution
reliably is already sufficient to provide a reliable
spectroscopic estimate of these fundamental distributions
and scaling relations. In that case, the basis for deciding
that it is worth reconstructing these relations from photo-z
data is not clear. However, as long as the spectroscopic
sample spans the entire range of photometric observables, we
show in Section 4.2 that a detailed knowledge of the photo-z
error distribution (Figure 8) can be inferred with only 10%
of randomly spaced spectra in redshift space. This rather
conservative choice will guarantee an accurate
reconstruction of the intrinsic relations.
Cross-correlations with other surveys may also provide
enough reliable information to specify $p(\zeta|z)$
accurately.



The second problem is more challenging. If the spectra are
not simply a random subset of the magnitude-limited
photometric sample, then it may be difficult to quantify,
and so correct for, the selection effects associated with
the spectroscopic subset. In particular, if the
spectroscopic sub-sample does not span the entire range of
photometric observables, it is problematic to perform the
reconstruction. However, in this situation one may rely on
photo-z error estimates for the range where spectral
information is missing, and still apply the deconvolution
procedure. We investigate this idea further in Section 4.3,
and discuss its applicability and limitations.
Figure 9. Reconstructions of the intrinsic redshift distribution (long-dashed lines) when only 15% [left panel], 10% [middle panel], or
5% [right panel] of the original spectral information is used. For comparison, the reconstruction performed with the full early-type
spectroscopic information is also plotted ("fiducial" distribution), with jagged dot-dashed lines. The solid and dotted histograms are the 
redshift distributions of the early-type spectroscopic and photometric catalogs, respectively. 



4.2 How many spectra? 

The accuracy in the reconstruction of the intrinsic redshift
distribution depends both on the quality of the photometric
redshifts and on the size of the calibration sample with
spectral information. To study how this accuracy depends on
the size of the calibration sample, we consider the
early-type "calibration" catalog and randomly remove 85%,
90%, or 95% of the available spectroscopic information,
respectively. That is, we pick a random 15%, 10%, or 5% of
the apparent-magnitude-limited sample. We refer to this step
as the "degradation" of the catalog. We then measure the
conditional distributions $p(\zeta|z)$ for five different
spectroscopic redshift bins, and compare them with those
computed when the full spectroscopic information is
available. Figure 8 shows the result of this test. Dotted
lines in all the panels are the error probability functions
measured in different bins when all the spectral information
is available; long-dashed lines represent the cases when the
catalog is degraded to 15%, 10%, or 5% of its original size.
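
The degradation test can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the
catalog below is synthetic (Gaussian photo-z scatter of 0.038
around a uniform redshift distribution), the function names
`degrade`, `find_bin` and `conditional_error_pdfs` are
hypothetical, and plain histograms stand in for the spline
fits shown in Figure 8.

```python
import random


def degrade(catalog, fraction, seed=0):
    """Keep a random `fraction` of the spectroscopic calibration objects."""
    rng = random.Random(seed)
    n_keep = max(1, round(fraction * len(catalog)))
    return rng.sample(catalog, n_keep)


def find_bin(x, edges):
    """Index of the bin [edges[k], edges[k+1]) containing x, or None."""
    for k in range(len(edges) - 1):
        if edges[k] <= x < edges[k + 1]:
            return k
    return None


def conditional_error_pdfs(catalog, z_edges, err_edges):
    """Histogram estimate of p(zeta - z | z) per spectroscopic-redshift bin.

    Each catalog entry is a (z_spectro, z_photo) pair. Each returned row is
    a density over the error bins, normalized to integrate to one.
    """
    counts = [[0] * (len(err_edges) - 1) for _ in range(len(z_edges) - 1)]
    for z, zeta in catalog:
        i = find_bin(z, z_edges)
        j = find_bin(zeta - z, err_edges)
        if i is not None and j is not None:
            counts[i][j] += 1
    pdfs = []
    for row in counts:
        total = sum(row)
        pdfs.append([
            c / (total * (err_edges[k + 1] - err_edges[k])) if total else 0.0
            for k, c in enumerate(row)
        ])
    return pdfs


# Synthetic stand-in for the calibration catalog (NOT the SDSS data).
rng = random.Random(42)
toy_catalog = []
for _ in range(20000):
    z_true = rng.uniform(0.02, 0.22)
    toy_catalog.append((z_true, z_true + rng.gauss(0.0, 0.038)))

z_edges = [0.02, 0.06, 0.10, 0.14, 0.18, 0.22]      # five redshift bins
err_edges = [-0.15 + 0.01 * k for k in range(31)]   # bins in zeta - z

subset = degrade(toy_catalog, 0.10)                 # keep 10% of the spectra
pdf_full = conditional_error_pdfs(toy_catalog, z_edges, err_edges)
pdf_10 = conditional_error_pdfs(subset, z_edges, err_edges)
```

Comparing `pdf_10` with `pdf_full` row by row mirrors the
comparison between long-dashed and dotted lines in Figure 8.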

Figure 10. Convergence study of the reconstructed solutions
presented in Figure 9, for different spectroscopic
"degradation" levels. The root mean square fluctuations of
the difference expressed on the y-axis are also given, in the
redshift range $0.02 \le z \le 0.22$ denoted by the
horizontal arrow, as explained in the main text.

We now ask how accurately we can reconstruct the intrinsic
redshift distribution using these sub-samples, when the
error in the photometric redshift is given as in Figure 8
(the root-mean-square (RMS) of
$\delta z = z_{\rm photo} - z_{\rm spectro}$ is typically
~ 0.038). To this end, we use those "degraded" error
probabilities to recover the intrinsic redshift distribution
from photometric data; the result is displayed in Figure 9.
Jagged dot-dashed lines show the reconstruction when the
full catalog is used, and long-dashed lines are the results
of the deconvolutions performed with limited random
spectroscopic subsets. We quantify the scatter/convergence
of the deconvolution procedure in Figure 10, where we plot
the difference (within each bin) between the reconstructed
intrinsic distributions obtained when partial versus full
spectroscopic information (i.e. the "fiducial" distribution)
is used, normalized by the maximum value of the
reconstructed fiducial distribution. Within the figure, we
provide the RMS fluctuations of the difference, in the
redshift range $0.02 \le z \le 0.22$ marked by the
horizontal arrow in the panel. We find that the
reconstruction remains reliable when the spectral
information is reduced to as little as 10% of its original
size (giving ~ 3% scatter), for data randomly spaced in
redshift.
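
The convergence measure described above can be written down
compactly. The sketch below assumes that "RMS fluctuations"
means the square root of the mean squared normalized
difference over the selected bins; the function names and the
numbers in the example are invented for illustration and are
not taken from Figure 10.

```python
import math


def normalized_difference(partial, fiducial):
    """Per-bin difference between two reconstructions, divided by the
    maximum of the fiducial (full-spectroscopy) reconstruction."""
    peak = max(fiducial)
    return [(p - f) / peak for p, f in zip(partial, fiducial)]


def rms_in_range(z_centers, diff, z_min=0.02, z_max=0.22):
    """Root mean square of `diff` over bins with z_min <= z <= z_max."""
    selected = [d for z, d in zip(z_centers, diff) if z_min <= z <= z_max]
    return math.sqrt(sum(d * d for d in selected) / len(selected))


# Invented toy reconstructions in five redshift bins.
z_centers = [0.01, 0.05, 0.10, 0.15, 0.25]
fiducial = [1.0, 4.0, 10.0, 6.0, 2.0]
partial = [1.2, 3.8, 10.3, 5.9, 2.1]

diff = normalized_difference(partial, fiducial)
scatter = rms_in_range(z_centers, diff)   # only the three inner bins count
```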
Figure 11. Measurements of the conditional probabilities $p(\zeta - z|z)$ for five different redshift bins, as indicated in the panels. Dotted
lines are spline fits to the pdfs (the full early-type catalog is used), solid lines are unbiased Gaussian approximations with widths
determined by quadratically averaging the SDSS photo-z quoted errors within each redshift bin.




Figure 12. Reconstruction of the redshift distribution when 
Gaussian approximations are used for the pdf's (long-dashed 
line), as opposed to the case when the pdf's are measured di- 
rectly from the data (jagged dot-dashed line). 

4.3 Can we use Gaussian approximations? 

Suppose now that we are missing spectroscopic information
in some redshift interval, but that we have photometry
available along with photo-z quoted errors in that range.
Can we still use our deconvolution technique? In principle,
our method is always applicable, provided that
$p(\zeta|z)$ is known. How can we infer it, given the lack
of spectral information? In effect, the reconstruction of
the intrinsic redshift distribution depends not only on the
calibration sample size and on the size of the photometric
redshift error $\delta z$, but also on the shape of the
distribution of $\delta z$. Without relying on other surveys
which may supply the missing spectral information, we would
need to make some assumptions about its shape.



The easiest solution is to derive the conditional
distributions $p(\zeta|z)$ from the quoted photo-z errors.
Specifically, one may assume the $p(\zeta|z)$ to be unbiased
Gaussians (i.e. centred on $\zeta = z$), with widths
$\sigma_{z,i}$ determined by quadratically averaging the
SDSS photo-z quoted errors within each redshift bin. The
question arises as to whether this is a good approximation,
since there is no a priori reason for the error
distributions to be Gaussian and unbiased (Oyaizu et al.
2008). We test this idea on the early-type "calibration"
catalog, and the result is displayed in Figure 11. In each
panel, dotted lines show spline fits to the error
probability functions measured from the early-type catalog
(all the spectral information is used in this case), and
solid lines show their unbiased Gaussian approximations. As
is evident from the figure, the Gaussian approximations are
reasonable fits to the data in most bins (excluding small
departures in the tails or catastrophic photo-z failures),
with the exception of the bin at
$z_{\rm spectro} = 0.1727$, where the Gaussian fit is rather
poor. Unfortunately, this departure is sufficient to make
the overall reconstruction of the intrinsic redshift
distribution problematic. In Figure 12 we show, with a
long-dashed line, the outcome of the deconvolution algorithm
when unbiased Gaussian fits are assumed for the pdfs, and
show again for comparison the accurate reconstruction
(jagged dot-dashed line) described in Section 3.2. The
deconvolution fails most visibly in the redshift interval
where the Gaussian approximation breaks down (i.e. around
$z_{\rm spectro} = 0.1727$), and the overall result is a
poor recovery of the intrinsic distribution. Therefore, an
accurate knowledge of the distribution of photo-z errors is
essential for our method to work.
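
As an illustration of the Gaussian approximation tested
above, the following sketch builds an unbiased Gaussian
$p(\zeta|z)$ for one redshift bin. It assumes that
"quadratically averaging" means taking the root mean square
of the quoted errors in the bin; the function names and the
three quoted errors are hypothetical, not SDSS values.

```python
import math


def quadratic_mean(errors):
    """Root mean square of the quoted photo-z errors in one redshift bin."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def gaussian_error_pdf(zeta, z, sigma):
    """Unbiased Gaussian model for p(zeta | z): centred on zeta = z."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((zeta - z) / sigma) ** 2) / norm


# Hypothetical quoted errors for the galaxies falling in one bin.
quoted_errors = [0.03, 0.04, 0.05]
sigma_bin = quadratic_mean(quoted_errors)

# Evaluate the model on a grid of photo-z values around z = 0.10.
z_bin = 0.10
grid = [z_bin + (k - 300) * 0.001 for k in range(601)]
model = [gaussian_error_pdf(zeta, z_bin, sigma_bin) for zeta in grid]
```

Such a model can be compared bin by bin with the measured
error distribution before it is used in the deconvolution,
which is the check that Figure 11 performs.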

Nevertheless, improving photo-z uncertainties may help
to characterize $p(\zeta|z)$ more accurately. It is also
worth noting that, in the complete absence of a
spectroscopic counterpart, we are in principle still able to
apply our technique, provided that we have a physically
motivated model for the conditional error distribution. This
is the real power of the method. By contrast, the weighting
technique proposed by Lima et al. (2008), which addresses
similar goals, always requires the spectroscopic sub-sample
to span the entire range of photometric observables covered
by the photometric sample.



5 DISCUSSION 

Using a selected sample of early-type galaxies from the SDSS 
DR6, for which both photo-zs and spectro-zs are known, we 
applied our one- and two-dimensional deconvolution tech- 
niques (Sheth 2007; Rossi & Sheth 2008) to reconstruct the 
unbiased redshift, magnitude and size distributions, as well 
as the magnitude-size relation (Section 3). 

This approach is novel in recognizing that theoretical
predictions, such as the difference between photometric and
spectroscopic distance estimates, can be represented as
integral equations and solved using deconvolution
techniques. In the past, these techniques have been applied
only to observational problems, for instance in handling the
PSF of telescopes.

We showed that our technique recovers all the true dis- 
tributions and the joint relation, to a good degree of accu- 
racy. We discussed the magnitude dependence of the error 
conditional probabilities (Section 3.2), and argued that the 
problem of reconstructing the true magnitude or size distri- 
bution is best thought of as a one-dimensional deconvolution 
problem (Sections 3.3 and 3.4). We showed that even if the 
distribution of physical sizes is not severely biased, a signif- 
icant bias in the magnitude distribution suffices to compro- 
mise the size-luminosity relation (Section 3.5). We used our 
2D deconvolution technique to correct for this effect. 

We then discussed how to extend our procedure to deep
redshift catalogs, where limited spectroscopic information,
or only photometric data, is available (Section 4). For this
part, we performed two tests using the early-type
"calibration" sample. We found that using only 10% of the
spectroscopic information randomly spaced in our catalog is
sufficient for the reconstructions to be accurate to about
3% scatter, when the error in the photometric redshift is
typically $\delta z$ ~ 0.038. We also showed that assuming
unbiased Gaussians for the $p(\zeta|z)$ distributions, with
widths determined by quadratically averaging the SDSS
photo-z quoted errors within each redshift bin, is not
always a good approximation. However, we argued that,
provided one has a detailed knowledge of the pdf from other
surveys or from empirically motivated models (see for
instance van der Wel et al. 2009), our technique can still
be used, even when the spectroscopic sample does not span
the entire range of photometric observables covered by the
photometric sample.

We will address in more detail the problem of handling
photo-zs when spectroscopic information is missing (for
example using a "blind" deconvolution approach) in a
forthcoming publication, where we will also apply our
technique to reconstruct the luminosity function in deep
redshift catalogs such as the MegaZ-LRG (Collister et al.
2007).

Even though our discussion was mainly phrased in terms of
fundamental distributions and scaling relations for
early-type galaxies, and so may be useful for detailed
studies of early types (for example van den Bosch & van de
Ven 2008; Bernardi 2009), the method developed here is quite
general and can be applied to recover any intrinsic
correlations between distance-dependent quantities (even for
n correlated variables); potentially, it can impact a
broader range of studies in which at least one
distance-dependent quantity is involved. In fact, our
algorithms can be readily adapted to study the luminosity
function in relatively shallow peculiar velocity surveys
with noisy Fundamental Plane or $D_n-\sigma$ distance
estimates (Faber et al. 2007; Tully et al. 2009), or to
handle correctly uncertainties in supernova measurements
(Krauss et al. 2007; Frieman et al. 2008; Sako et al. 2008),
which can bias the redshift-dependent equation of state
(Bridle & King 2007; Fosalba & Dore 2007).

A variety of other correlations can be re-analyzed along
these lines (see for example Saracco et al. 2009), such as
the $R-L$ relation for blue galaxies (Melbourne et al. 2007),
the photometric Fundamental Plane (Bolton et al. 2007),
and also correlations that do not involve luminosity, such as
the Kormendy (1977) relation. Other possible applications
involve quasars (Croom et al. 2009; Richards et al. 2009),
black-hole $M-L$ correlations, correlations with
environment, and potentially future baryonic acoustic
oscillation and dark energy surveys.
APPENDIX A: DIRECT RECONSTRUCTIONS

A widely used method for extracting the redshift probability
distribution function (PDFz) from a $\chi^2$ minimization is
the following (see for example Bolzonella et al. 2000). A
$\chi^2$ is formed, typically as



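$$\chi^2(z, T, A) = \sum_{f=1}^{N_f} \left[ \frac{F^f_{\rm obs} - A \cdot F^f_{\rm pred}(z,T) \cdot 10^{-0.4 s_f}}{\sigma^f_{\rm obs}} \right]^2 , \qquad \mathrm{(A1)}$$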



where $F^f_{\rm pred}(z,T)$ is the flux predicted for a
template $T$ at redshift $z$, $F^f_{\rm obs}$ is the observed
flux, $\sigma^f_{\rm obs}$ is the associated error, $f$
refers to each specific filter and $s_f$ is the zero-point
offset. The photo-z is estimated from the $\chi^2$
minimization with respect to the free parameters $z$, $T$,
and the normalization factor $A$; namely, the photo-z is the
redshift value which minimizes the merit function
$\chi^2(z,T,A)$. The associated PDFz, or $p(z|\zeta)$, is
derived from (A1),



(A2) 



There is an important conceptual point in adopting this
approach. Photo-zs are noisy distance estimates, as opposed
to spectroscopic redshifts, which are intrinsic or "true"
solutions. Therefore, while
$\langle z_{\rm photo}|z_{\rm spectro} \rangle \to z_{\rm spectro}$
(i.e. convergence of the noisy distribution to the true
value), it is certainly not true that
$\langle z_{\rm spectro}|z_{\rm photo} \rangle \to z_{\rm photo}$.
This is equivalent to saying that the distribution
$p(z|\zeta)$, obtained by binning horizontally the plane
$[z_{\rm photo}, z_{\rm spectro}]$ shown in the left panel of
Figure 3, is biased by definition. Hence, it is more
meaningful to estimate $p(\zeta|z)$ rather than attempting
to derive $p(z|\zeta)$.

However, since current photo-z codes output $p(z|\zeta)$
rather than $p(\zeta|z)$, one may wonder whether we can
apply our technique using the PDFz. In effect, our
deconvolution method relies on Bayes's theorem. For example,
if we consider the redshift distribution, application of
this theorem yields



$$p(z, \zeta) = N(z)\, p(\zeta|z) = p(\zeta, z) = \mathcal{N}(\zeta)\, p(z|\zeta). \qquad \mathrm{(A3)}$$



From the previous relation, it is immediate to show that 



$$N(z) = \int \mathcal{N}(\zeta)\, p(z|\zeta)\, {\rm d}\zeta . \qquad \mathrm{(A4)}$$



In this respect, the true (spectroscopic) redshift
distribution can alternatively be viewed as a convolution of
the noisy photo-z distribution with the PDFz. Therefore,
assuming that the PDFz is known from the output of
photometric redshift codes and given the observed photo-z
distribution, one can obtain the intrinsic $N(z)$ by simply
performing the integration (A4). This idea is further
explored in Sheth & Rossi (2009), where examples of these
calculations are presented using the SDSS sample described
here.
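
On a grid, the integration (A4) reduces to a weighted sum
over photo-z bins. The sketch below is a minimal
illustration, assuming equal-width bins and a PDFz tabulated
as a matrix; the function name `direct_reconstruction` and
all numbers are invented for the example.

```python
def direct_reconstruction(n_obs, pdfz, d_zeta):
    """Discrete form of (A4): N(z_i) = sum_j N_obs(zeta_j) p(z_i|zeta_j) d_zeta.

    n_obs[j]   : observed photo-z distribution (per unit zeta) in bin j
    pdfz[j][i] : p(z_i | zeta_j), a density in z for each photo-z bin j
    d_zeta     : width of the photo-z bins
    """
    n_z = len(pdfz[0])
    return [
        sum(n_obs[j] * pdfz[j][i] for j in range(len(n_obs))) * d_zeta
        for i in range(n_z)
    ]


# Toy example with three bins of width 0.1 in both z and zeta.
# Each row of `toy_pdfz` integrates to one over z (sum * 0.1 = 1).
toy_n_obs = [100.0, 300.0, 200.0]
toy_pdfz = [
    [8.0, 2.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 2.0, 8.0],
]
n_intrinsic = direct_reconstruction(toy_n_obs, toy_pdfz, d_zeta=0.1)
```

Because each row of `toy_pdfz` is normalized, the total number
of objects is preserved by the sum.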

Similarly, one can obtain the magnitude distribution 
with a direct integration, since 
$$N(M) = \int \mathcal{N}(\mathcal{M})\, p(M|\mathcal{M})\, {\rm d}\mathcal{M} , \qquad \mathrm{(A5)}$$

and in principle recover scaling relations as well. However,
we remind the reader that when dealing with a real dataset,
a noisy observation needs to be "deconvolved" into an
intrinsic signal; namely, from $\mathcal{N}(\zeta)$ one
needs to reconstruct $N(z)$. Moreover, $p(z|\zeta)$ is
usually not known, while $p(\zeta|z)$ can be inferred
reliably with a proper "spectroscopic training set". This is
our main motivation for providing a deconvolution approach.
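
To make the contrast with the direct integration concrete,
the sketch below shows one way an iterative deconvolution of
a binned distribution can be set up. The update rule is not
spelled out in this section, so a Richardson-Lucy-type
iteration is assumed here purely for illustration; the
function names, the toy kernel and the toy counts are all
hypothetical.

```python
def forward(n_true, kernel):
    """Predicted photo-z counts: sum_i N(z_i) P(zeta_j | z_i)."""
    n_phot = len(kernel[0])
    return [
        sum(n_true[i] * kernel[i][j] for i in range(len(n_true)))
        for j in range(n_phot)
    ]


def lucy_iterations(n_obs, kernel, n_iter=10):
    """Richardson-Lucy-type iteration (an assumed update rule).

    kernel[i][j] = P(photo-z bin j | true-z bin i); each row sums to one.
    The observed distribution is used as the starting guess, so the two
    binnings must have the same length in this simple sketch.
    """
    estimate = list(n_obs)
    for _ in range(n_iter):
        predicted = forward(estimate, kernel)
        estimate = [
            estimate[i] * sum(
                kernel[i][j] * n_obs[j] / predicted[j]
                for j in range(len(n_obs)) if predicted[j] > 0
            )
            for i in range(len(estimate))
        ]
    return estimate


# Toy test: blur a known distribution, then try to recover it.
toy_truth = [100.0, 300.0, 200.0]
toy_kernel = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]
toy_observed = forward(toy_truth, toy_kernel)
toy_recovered = lucy_iterations(toy_observed, toy_kernel, n_iter=10)
```

Only $p(\zeta|z)$ enters this scheme (through `toy_kernel`),
which is the quantity that a spectroscopic training set
constrains directly.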



